10 research outputs found

    Specialising Parsers for Queries

    Many software systems consist of data processing components that analyse large datasets to gather information and learn from them. Often, only part of the data is relevant for analysis, so data processing systems contain an initial preprocessing step that filters out the unwanted information. While efficient data analysis techniques and methodologies are accessible to non-expert programmers, data preprocessing seems to be forgotten, or worse, ignored, even though real performance gains are possible by preprocessing data efficiently. Implementations of the data preprocessing step traditionally have to trade modularity for performance: to achieve modularity, one separates the parsing of raw data from the filtering of it, which leads to slow programs because intermediate objects are created during execution. The efficient version is a low-level implementation that interleaves parsing and querying. In this dissertation we demonstrate a principled and practical technique to convert the modular, maintainable program into its interleaved, efficient counterpart. Key to achieving this objective is the removal, or deforestation, of intermediate objects in a program execution. We first show that by encoding data types using Böhm-Berarducci encodings (often referred to as Church encodings), and combining these with partial evaluation of function composition, we achieve deforestation. This allows us to implement optimisations themselves as libraries, with minimal dependence on an underlying optimising compiler. Next we illustrate the applicability of this approach to parsing and preprocessing queries. The approach is general enough to cover top-down and bottom-up parsing techniques, and deforestation of pipelines of operations on lists and streams. We finally present a set of transformation rules that, given a parser for a nested data format and a query on the parsed structure, produce a parser specialised for the query. As a result we preserve the modularity of writing parsers and queries separately while also minimising resource usage. These transformation rules combine deforested implementations of both libraries to yield an efficient, interleaved result.
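    To make the deforestation idea concrete, here is a minimal Scala sketch (illustrative names only, not the dissertation's actual library) of Böhm-Berarducci-encoded lists: a list is represented by its own fold, so map and filter compose as plain function composition and a whole pipeline traverses the input once without building intermediate lists.

```scala
// Hypothetical sketch: a Böhm-Berarducci-encoded list is its own right fold.
trait ChurchList[A] {
  def fold[B](cons: (A, B) => B, nil: B): B
}

object ChurchList {
  def fromList[A](xs: List[A]): ChurchList[A] = new ChurchList[A] {
    def fold[B](cons: (A, B) => B, nil: B): B = xs.foldRight(nil)(cons)
  }

  implicit class Ops[A](self: ChurchList[A]) {
    // map merely rewrites the `cons` argument; no intermediate list exists
    def map[B](f: A => B): ChurchList[B] = new ChurchList[B] {
      def fold[C](cons: (B, C) => C, nil: C): C =
        self.fold[C]((a, acc) => cons(f(a), acc), nil)
    }
    // filter fuses into the same single traversal
    def filter(p: A => Boolean): ChurchList[A] = new ChurchList[A] {
      def fold[C](cons: (A, C) => C, nil: C): C =
        self.fold[C]((a, acc) => if (p(a)) cons(a, acc) else acc, nil)
    }
    def toList: List[A] = self.fold[List[A]](_ :: _, Nil)
  }
}

// ChurchList.fromList(xs).map(f).filter(p).toList walks xs exactly once.
```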

    Fold-based fusion as a library: a generative programming pearl

    Fusion is a program optimisation technique commonly implemented using special-purpose compiler support. In this paper, we present an alternative approach, implementing fold-based fusion as a standalone library. We use staging to compose operations on folds; the operations are partially evaluated away, yielding code that does not construct unnecessary intermediate data structures. The technique extends to partitioning and grouping of collections.
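    A sketch of the fold-based representation (hypothetical names; the paper additionally stages these closures so they are partially evaluated away): a collection is represented by its foldLeft, so map and filter rewrite the step function instead of building intermediate collections.

```scala
// Illustrative sketch: a collection represented as its foldLeft.
trait FoldLeft[A] { outer =>
  def apply[S](z: S)(step: (S, A) => S): S

  // map rewrites the step function; no new collection is built
  def map[B](f: A => B): FoldLeft[B] = new FoldLeft[B] {
    def apply[S](z: S)(step: (S, B) => S): S =
      outer(z)((s, a) => step(s, f(a)))
  }

  // filter fuses into the same loop
  def filter(p: A => Boolean): FoldLeft[A] = new FoldLeft[A] {
    def apply[S](z: S)(step: (S, A) => S): S =
      outer(z)((s, a) => if (p(a)) step(s, a) else s)
  }
}

object FoldLeft {
  def range(lo: Int, hi: Int): FoldLeft[Int] = new FoldLeft[Int] {
    def apply[S](z: S)(step: (S, Int) => S): S = {
      var s = z; var i = lo
      while (i < hi) { s = step(s, i); i += 1 }
      s
    }
  }
}

// Sum of the even squares below 100, computed in a single loop:
// FoldLeft.range(0, 100).map(x => x * x).filter(_ % 2 == 0)(0)(_ + _)
```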

    Accelerating parser combinators with macros

    Parser combinators provide an elegant way of writing parsers: parser implementations closely follow the structure of the underlying grammar, while accommodating interleaved host language code for data processing. However, the host language features used for composition introduce substantial overhead, which leads to poor performance. In this paper, we present a technique to systematically eliminate this overhead. We use Scala macros to analyse the grammar specification at compile time and remove the composition overhead, leaving behind an efficient top-down, recursive-descent parser. We compare our macro-based approach to a staging-based approach using the LMS framework, and provide an experience report in which we discuss the advantages and drawbacks of both methods. Our library outperforms Scala's standard parser combinators on a set of benchmarks by an order of magnitude, and is 2x faster than code generated by LMS.
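    The overhead in question is easy to see in a stripped-down combinator sketch (illustrative, not the paper's API): every use of sequencing and map below allocates closures, tuples and wrapper objects at parse time, which is exactly the structure macros can inline away at compile time.

```scala
// Illustrative sketch, not the paper's API.
case class ParseResult[+A](value: Option[A], rest: String)

trait Parser[+A] { self =>
  def parse(in: String): ParseResult[A]

  // Sequencing allocates a closure and a tuple per successful parse.
  def ~[B](that: => Parser[B]): Parser[(A, B)] = new Parser[(A, B)] {
    def parse(in: String): ParseResult[(A, B)] =
      self.parse(in) match {
        case ParseResult(Some(a), rest) =>
          that.parse(rest) match {
            case ParseResult(Some(b), rest2) => ParseResult(Some((a, b)), rest2)
            case _                           => ParseResult(None, in)
          }
        case _ => ParseResult(None, in)
      }
  }

  // map wraps every result in yet another object.
  def map[B](f: A => B): Parser[B] = new Parser[B] {
    def parse(in: String): ParseResult[B] = {
      val r = self.parse(in)
      ParseResult(r.value.map(f), r.rest)
    }
  }
}

object Digits {
  val digit: Parser[Char] = new Parser[Char] {
    def parse(in: String): ParseResult[Char] =
      if (in.nonEmpty && in.head.isDigit) ParseResult(Some(in.head), in.tail)
      else ParseResult(None, in)
  }

  // Grammar-shaped and readable -- but allocation-heavy at parse time.
  val twoDigits: Parser[Int] =
    (digit ~ digit).map { case (a, b) => (a.toString + b).toInt }
}
```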

    What are the Odds? Probabilistic programming in Scala

    Probabilistic programming is a powerful high-level paradigm for probabilistic modeling and inference. We present Odds, a small domain-specific language (DSL) for probabilistic programming, embedded in Scala. Odds provides first-class support for random variables and probabilistic choice, while reusing Scala's abstraction and modularity facilities for composing probabilistic computations and for executing deterministic program parts. Odds accurately represents possibly dependent random variables using a probability monad that models committed choice. This monadic representation of probabilistic models can be combined with a range of inference procedures. We present engines for exact inference, rejection sampling and importance sampling with look-ahead, but other types of solvers are conceivable as well. We evaluate Odds on several non-trivial probabilistic programs from the literature and we demonstrate how the basic probabilistic primitives can be used to build higher-level abstractions, such as rule-based logic programming facilities, using advanced Scala features.
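    For readers unfamiliar with the underlying structure, here is the textbook probability monad in Scala (a deliberately simplified sketch; Odds itself uses a committed-choice monad and richer inference engines): random variables are weighted outcome lists, flatMap composes dependent choices, and exact inference is enumeration.

```scala
// Simplified sketch: the classic distribution monad, not Odds'
// committed-choice representation.
case class Dist[A](outcomes: List[(A, Double)]) {
  def flatMap[B](f: A => Dist[B]): Dist[B] =
    Dist(for ((a, p) <- outcomes; (b, q) <- f(a).outcomes) yield (b, p * q))

  def map[B](f: A => B): Dist[B] = flatMap(a => Dist.always(f(a)))

  // Exact inference: total weight of outcomes satisfying the predicate.
  def pr(p: A => Boolean): Double =
    outcomes.collect { case (a, w) if p(a) => w }.sum
}

object Dist {
  def always[A](a: A): Dist[A] = Dist(List((a, 1.0)))
  def flip(p: Double): Dist[Boolean] = Dist(List((true, p), (false, 1.0 - p)))
}

object CoinExample {
  // Two choices composed monadically; P(both heads) = 0.25.
  val bothHeads: Dist[Boolean] = for {
    a <- Dist.flip(0.5)
    b <- Dist.flip(0.5)
  } yield a && b

  val answer: Double = bothHeads.pr(identity) // 0.25
}
```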

    ExPASy: SIB bioinformatics resource portal

    ExPASy (http://www.expasy.org) has a worldwide reputation as one of the main bioinformatics resources for proteomics. It has now evolved, becoming an extensible and integrative portal accessing many scientific resources, databases and software tools in different areas of life sciences. Scientists can henceforth seamlessly access a wide range of resources in many different domains, such as proteomics, genomics, phylogeny/evolution, systems biology, population genetics, transcriptomics, etc. The individual resources (databases, web-based and downloadable software tools) are hosted in a ‘decentralized’ way by different groups of the SIB Swiss Institute of Bioinformatics and partner institutions. Specifically, a single web portal provides a common entry point to a wide range of resources developed and operated by different SIB groups and external institutions. The portal features a search function across ‘selected’ resources. Additionally, the availability and usage of resources are monitored. The portal is aimed at both expert users and people who are not familiar with a specific domain in life sciences. The new web interface provides, in particular, visual guidance for newcomers to ExPASy.

    Optimizing data structures in high-level programs: New directions for extensible compilers based on staging

    High-level data structures are a cornerstone of modern programming and at the same time stand in the way of compiler optimizations. In order to reason about user- or library-defined data structures, compilers need to be extensible. Common mechanisms to extend compilers fall into two categories. Frontend macros, staging or partial evaluation systems can be used to programmatically remove abstraction and specialize programs before they enter the compiler. Alternatively, some compilers allow extending their internal workings by adding new transformation passes at different points in the compile chain or by adding new intermediate representation (IR) types. None of these mechanisms alone is sufficient to handle the challenges posed by high-level data structures. This paper shows a novel way to combine them to yield benefits that are greater than the sum of the parts.
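    As a toy illustration of the first mechanism (a hypothetical sketch, vastly simpler than a real staging framework such as LMS): programs are built as an intermediate representation, and smart constructors partially evaluate abstraction away before any compiler pass runs.

```scala
// Hypothetical sketch of staging-style specialisation via smart constructors.
sealed trait Exp
case class Const(value: Int)         extends Exp
case class Sym(name: String)         extends Exp
case class Plus(lhs: Exp, rhs: Exp)  extends Exp
case class Times(lhs: Exp, rhs: Exp) extends Exp

object Exp {
  // Smart constructors simplify the IR as it is built.
  def plus(a: Exp, b: Exp): Exp = (a, b) match {
    case (Const(x), Const(y)) => Const(x + y)
    case (Const(0), e)        => e
    case (e, Const(0))        => e
    case _                    => Plus(a, b)
  }
  def times(a: Exp, b: Exp): Exp = (a, b) match {
    case (Const(x), Const(y))          => Const(x * y)
    case (Const(1), e)                 => e
    case (e, Const(1))                 => e
    case (Const(0), _) | (_, Const(0)) => Const(0)
    case _                             => Times(a, b)
  }
}

// A generic power function, unrolled and simplified at IR-construction time:
object Power {
  def power(base: Exp, n: Int): Exp =
    if (n == 0) Const(1) else Exp.times(base, power(base, n - 1))

  // power(Sym("x"), 3) == Times(Sym("x"), Times(Sym("x"), Sym("x")))
}
```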

    Go Meta! A Case for Generative Programming and DSLs in Performance Critical Systems

    Most performance-critical software is developed using very low-level techniques. We argue that this needs to change, and that generative programming is an effective avenue to enable the use of high-level languages and programming techniques in many such circumstances.